The present invention relates to cathepsin L in biologically active form, methods of
making cathepsin L in E. coli, cathepsin L in crystalline form, use of the X-ray structure
atomic co-ordinates in rational drug design and drugs obtained thereby.

Cathepsin L is a thiol protease that plays an important role in the degradation of 
proteins within the lysosomes and in the extracellular matrix. It occurs in at least three 
forms: prepro-, pro- and mature cathepsin L. Furthermore, mature cathepsin L can be
"clipped" through proteolysis to yield heavy and light chains. The pro and prepro forms are
biologically inactive.

Besides its role in protein turnover, cathepsin L has been implicated in bone 
remodelling, glomerulonephritis, rheumatoid arthritis, antigen processing and tumour 
metastasis. The search for drugs that intervene in these diseases has been impeded by lack 
of knowledge as to the three-dimensional structural determinants of the specificity of 
cathepsin L. These can only be accurately ascertained through the technique of X-ray
crystallography, which involves the analysis of X-ray diffraction of a crystalline sample. 

Obtaining a crystalline sample suitable for the structural analysis of cathepsin L has 
eluded all researchers to date. The supply of pure active protein for crystallisation trials has 
been a major stumbling block in the growth of large crystals. Whilst not wishing to be
bound by theoretical considerations, the presence of a free cysteine residue at the active site
(which can cause mis-matched disulphide bond pairing) probably contributes to making the
folding process complex. The sensitivity of the enzyme to extremes of pH also makes the
enzyme difficult to handle, because it is often desirable to work with the enzyme at
extremes of pH, for example in refolding experiments.

Abbreviations:

CM       carboxymethyl
dOT      dissolved oxygen tension
NHMec    7-amido-4-methyl-coumarin
MPD      methylpentandiol
NTA      nitrilo-triacetic acid
OTR      oxygen transfer rate
OUR      oxygen uptake rate
PAGE     polyacrylamide gel electrophoresis
PCR      polymerase chain reaction
SDS      sodium dodecyl sulphate
VVM      volume volume per minute
Z        benzyloxycarbonyl



In an early report Smith and Gottesman (1989) in J. Biol. Chem. 264, 20487-20495,
described the expression of human procathepsin L in E. coli as an insoluble product in
inclusion bodies. They described the isolation and subsequent solubilisation of inclusion
bodies followed by folding then activation (by acid treatment) of the enzyme to produce
active enzyme in poor yield (<1% by activity). The authors reported that the majority of
the product in the renaturation extract was insoluble and by implication misfolded. After
purification by Sephadex G-75 chromatography the specific activity (as measured by the
Z-Phe-Arg-NHMec assay described below) of the purified folded product was low (22,000
nmol/min/mg), implying that only a portion of the final pure product was in the correctly
folded state.

Dolinar et al. ((1995) Biol. Chem. Hoppe-Seyler 376, 385-388) have also
described the expression of procathepsin L in E. coli as an insoluble product in inclusion
bodies, followed by extraction and solubilisation of the inclusion bodies, then folding of
procathepsin L followed by acid activation. Despite improvements in yield to 4.4%, the
authors calculate that only 30% of their folded cathepsin L product is active, implying that
70% of the product is inactive, presumably still misfolded. The misfolded enzyme would
act to frustrate the crystallisation of the enzyme, as it would interfere with the build-up
of the ordered arrays of molecules that constitute the crystal.

The present invention is based on overcoming several technical hurdles: (1) we
have secreted procathepsin L into E. coli periplasm in soluble form; (2) we have purified
active cathepsin L from the growth medium in milligram amounts at high specific activity
(> 100,000 nmoles/min/mg); (3) we have devised a process for the production of correctly
folded molecules from soluble incorrectly folded molecules; (4) we have obtained
cathepsin L in a crystalline form; (5) we have collected an X-ray diffraction data set from
the crystals; and (6) we have determined the 3-dimensional structure of cathepsin L from the
X-ray diffraction data. This involved using papain as a model in molecular replacement
procedures. The structure has been refined to give an accurate model which is used to
understand the enzyme specificity and forms the basis of rational drug design.

According to one aspect of the present invention there is provided protein cathepsin
L in which at least 40% of the protein is correctly folded in a biologically active
conformation. Preferably at least 50% of the protein is correctly folded, more preferably
at least 60% of the protein is correctly folded, more preferably at least 70% of the
protein is correctly folded, more preferably at least 80% of the protein is correctly folded,
more preferably at least 90% of the protein is correctly folded and especially at least
95% of the protein is correctly folded.

The term "cathepsin L" as used herein (unless otherwise apparent from the context
used) includes analogues or variants such as for example cathepsin L - (His)6, where the
(His)6 tag is used to assist purification, but other analogues are contemplated, especially
conservative analogues which would not unduly alter the biological properties of the
protein. Generally cathepsin L is in isolated form, meaning it is at least partially purified
from its naturally occurring state.

The terms "Cathepsin L-6His" and "cathepsin L - (His)6" refer to cathepsin L with
a tag of 6 histidine amino acids added to its C-terminus.

According to another aspect of the present invention there is provided cathepsin L
at a specific activity of at least 40,000 nmoles/min/mg. Specific activity is determined
using the Z-Phe-Arg-NHMec assay described below. Preferably the specific activity is at
least 50,000 nmoles/min/mg, more preferably the specific activity is at least 60,000
nmoles/min/mg, more preferably the specific activity is at least 70,000 nmoles/min/mg,
more preferably the specific activity is at least 80,000 nmoles/min/mg, more preferably
the specific activity is at least 90,000 nmoles/min/mg and especially the specific activity is
at least 100,000 nmoles/min/mg.

According to another aspect of the present invention there is provided a method of
making cathepsin L which comprises directed expression of procathepsin L encoded on an
expression vector into the periplasm of E. coli cultured in a growth medium. Preferably the
E. coli is cultured at a temperature of 15-30°C and especially at 25°C. Preferably the
expression vector comprises inducible expression control and the growth medium
comprises a concentration of inducer optimised for maximal soluble expression of
cathepsin (this is a lower concentration, optimised for example by titration, than the
concentration of inducer required for maximal expression of cathepsin). Culture of E. coli
at low temperature (15-30°C) gives a slow rate of cathepsin production, which has the
advantage of producing higher yields of correctly folded soluble cathepsin; use of a low
concentration of inducer enhances this advantageous effect. Preferably soluble cathepsin L
is collected from growth medium and activated by removal of the pro sequence and
clipping the cathepsin into its 2-chain form.

According to another aspect of the present invention there is provided a secretion
vector for directed expression of cathepsin L into the periplasm of E. coli.

According to another aspect of the present invention there is provided an E. coli host
comprising a secretion vector for directed expression of procathepsin L into the host
periplasm.

According to another aspect of the present invention there is provided E. coli MSD
2148 (MSD 213 pZen 1677) deposited under the Budapest Treaty as NCIMB accession no
40773 on 11th October 1995 with the National Collection of Industrial & Marine Bacteria,
23 St Machar Drive, Aberdeen AB2 1RY, Scotland, United Kingdom.

According to another aspect of the present invention there is provided cathepsin L in
crystalline form. Preferably the crystals possess a monoclinic space group P2₁ with unit
cell dimensions a = 46.23, b = 49.38, c = 49.25 Å and β = 113.45°. Preferably the crystals
are wedge-shaped crystals and are preferably about 0.1 mm³. Preferably the crystals
essentially have the atomic coordinates set out in Example 4.

According to another aspect of the present invention there is provided X-ray
coordinates of cathepsin L. Preferably the coordinates are substantially as set out in
Example 4. Preferred X-ray coordinates are of the active site of cathepsin L as defined in
Example 4 in residues 17-26, 29, 61-75, 111-118, 132-145, 158-166, 183-189 & 209-214.

According to another aspect of the present invention there is provided the use of the
3-dimensional structure of cathepsin L as determined from X-ray diffraction data in
rational drug design. Preferred methods are described in Example 3 such as any of the
following: searching real and virtual compound databases for potential drugs; computational
growth of ligands in the active site; and collecting X-ray diffraction data from crystals
comprising potential drug.

According to another aspect of the present invention there is provided a novel
inhibitor of cathepsin L determined by rational drug design using X-ray diffraction data of
cathepsin L. The cathepsin L catalytic residues, i.e. those which play an active role in
catalysis, are Cys25, His163, Gln19, Asn183 and, in a structural role, Trp185. This, however,
is a configuration conserved in almost all cysteine proteases, and we need to look beyond
the active site when carrying out drug design specific to cathepsin L. In fact, the surface of
the entire active site cleft can be important. The following residue ranges form the active
site cleft and therefore are most likely to be involved in drug design: 17-26, 29, 61-75,
111-118, 132-145, 158-166, 183-189, 209-214. Note that any one drug would only interact
with a subset of these.
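
By way of illustration only, the residue ranges listed above may be held as a simple lookup so that the contacts made by a candidate ligand can be checked against the active site cleft. The following Python sketch is not part of the described method; the function names and the example contact list are hypothetical, and only the residue numbers are taken from the text.

```python
# Residue ranges quoted in the text for the cathepsin L active site cleft
# (inclusive start and end residue numbers).
ACTIVE_SITE_RANGES = [(17, 26), (29, 29), (61, 75), (111, 118),
                      (132, 145), (158, 166), (183, 189), (209, 214)]

# Catalytic residues named in the text (residue number -> residue type).
CATALYTIC_RESIDUES = {25: "Cys", 163: "His", 19: "Gln", 183: "Asn", 185: "Trp"}


def expand_ranges(ranges):
    """Expand inclusive (start, end) pairs into a set of residue numbers."""
    residues = set()
    for start, end in ranges:
        residues.update(range(start, end + 1))
    return residues


ACTIVE_SITE = expand_ranges(ACTIVE_SITE_RANGES)


def contacts_in_cleft(contact_residues):
    """Return the sorted subset of a ligand's contact residues lying in the cleft."""
    return sorted(set(contact_residues) & ACTIVE_SITE)


if __name__ == "__main__":
    # Hypothetical contact list for an imaginary ligand.
    print(contacts_in_cleft([25, 30, 163, 200]))
```

Because any one drug would interact with only a subset of the cleft, a set intersection of this kind is a convenient first filter before more detailed modelling.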

The invention will now be illustrated by the following non-limiting examples in
which:

SEQ ID NO: 7 illustrates a cDNA nucleotide sequence of cathepsin L in which 289-339
represents signal peptide, 289-1290 represents pro-cathepsin L and 340-1290 represents
mature cathepsin L (Gal S. 1988, Biochem. J. 251, 303-306);

SEQ ID NO: 8 illustrates an amino acid sequence of cathepsin L in which 1-17 represents
signal peptide, 18-113 represents pro peptide, 114-288 represents heavy chain and 292-333
represents light chain (Ritonja 1988, FEBS Letters 228, 341-345; Mason 1986, Biochem. J.
242, 373-377; Joseph 1987, Nucleic Acids Research 15, 3186);

Figure 1 illustrates a cloned PCR product showing restriction sites;

Figure 2 illustrates pDP480;

Figure 3 illustrates pCRII/cathepsin L-6-His; and

Figure 4 illustrates pDP483 (pZEN1677).




Example 1 - Production of recombinant human cathepsin L-6His
1.1 Cloning of Cathepsin L-6His

The coding sequence for full length preprocathepsin L was generated from a human
epithelial cell cDNA library by PCR using oligonucleotides complementary to the 5' and 3'
ends:

Oligonucleotides:

GCAGTAAGATATGAATCCTACACT (SEQ ID NO: 1)
CATCACCGTCCACAGCTCACACAG (SEQ ID NO: 2)


The PCR product generated was cloned into the PCR cloning vector pCR II
(InVitrogen). A diagram of the cloned PCR product showing relevant restriction sites is
shown in Figure 1. This construct formed the starting material for generation of the E. coli
secretion vector construct.

1.2 Generation of E. coli procathepsin L-6His secretion vector

The vector used for secretion of Cathepsin L in E. coli was pAG170.
In pAG170, sequences to be expressed are cloned downstream of the Erwinia carotovora
pel B secretory leader sequence under control of the ara B/C promoter from Salmonella
typhimurium.

For cloning into pAG170, a PCR product was generated incorporating the sequence
from the start of the pro region to the internal EcoR I site of procathepsin L, using the
initial Cathepsin L construct above as the target DNA. The 5'-3' oligonucleotide also
introduced a Nco I restriction site. The 3'-5' oligonucleotide was complementary to
sequence encompassing the internal EcoR I site.


5'-3' Oligonucleotide:
Nco I
GATGACILAJIKiCGACTCTAACATTTGATCACAG (SEQ ID NO: 3)

3'-5' Oligonucleotide:
EcoR I
CTGCCTQAAHCTTCACTGGTCATGTCTC (SEQ ID NO: 4)

The PCR product generated was digested with Nco I and EcoR I and cloned into
Nco I / EcoR I digested pAG170. A clone with correct sequence was identified (as pDP
480) and is shown diagrammatically in Figure 2. The 3' end of procathepsin L from an
internal Nde I site to the end of the coding sequence was then generated by PCR. The 5'-3'
oligonucleotide encompassed the internal Nde I site. The 3'-5' oligonucleotide was
designed so as to introduce 6 histidine residues (for purification purposes) immediately
upstream of the translational stop codon as well as EcoR I and Hind III restriction sites
downstream of the coding sequence.

5'-3' Oligonucleotide:
Nde I
CTATCCAIAIQAGGCAACAGAAGAATCCTG (SEQ ID NO: 5)

3'-5' Oligonucleotide:
Hind III EcoR I stop codons 6 His tag
G AC AAQCH QAATTC TTA TTA OTP A TGGTG ATOGTGGTG
CACAGTGGGGTAGCTGGCTG (SEQ ID NO: 6)


The PCR product obtained was digested with Nde I and Hind III and cloned back
into Nde I / Hind III digested pCR II preprocathepsin L to replace original sequence. The
sequence of the 6-His tagged 3' Nde I - EcoR I fragment was confirmed (including
sequence upstream of the Nde I site). A diagram of pCRII/Cathepsin L-6His is shown in
Figure 3.

To complete the cloning of procathepsin L into the secretion vector, the fragment of
procathepsin L-6His from the internal EcoR I site to the EcoR I site at the 3' end was
excised from the pCR II construct and cloned into EcoR I digested pDP480. A clone of the
correct orientation was identified. This generates a tetracycline resistant plasmid encoding
procathepsin L-6His downstream of an in-frame pel B secretion leader, whose expression
is under the control of an ara B/C promoter / regulator cassette. This construct was initially
designated pDP 483 and is shown diagrammatically in Figure 4. The construct was
designated pZen 1677 (also known as pICI 1677; NCIMB 40773).

1.3 Fermentation process for procathepsin L-6His

E. coli strain MSD 213 was transformed with plasmid pZen 1677 and the resultant
strain MSD 2148 (MSD 213 pZen 1677; NCIMB No 40773) stored as a glycerol stock at
-80°C. An aliquot of MSD 213 pZen 1677 was streaked onto agar plates of L-tetracycline to
separate single colonies after overnight growth at 37°C. A single colony of MSD 213 pZen
1677 was removed and resuspended in 10 ml L-tetracycline broth and 100 µl immediately
inoculated into each of seven 250 ml Erlenmeyer flasks containing 75 ml of L-tetracycline
broth. After growth for 15 h at 37°C on a reciprocating shaker the contents of the flasks
were pooled and used to inoculate a single fermenter containing 15 L of the growth medium
described in the Table below.

Table - Growth medium

Component                              Concentration (g/L deionized water)
Potassium dihydrogen orthophosphate    3.0
di-Sodium hydrogen orthophosphate      6.0
Sodium chloride                        0.5
Casein hydrolysate (Oxoid L.41)        2.0
Ammonium sulphate                      10.0
Glycerol                               35.0
Yeast Extract (Difco)                  20.0
Magnesium sulphate 7-hydrate           0.5
Calcium chloride 2-hydrate             0.03
Thiamine                               0.008
Iron sulphate 7-hydrate/Citric acid    0.04/0.02
Trace element solution (TES)*          0.5 ml/L
Tetracycline                           (10 mg/L)

*Trace metal solution (TES)
Component            mg/10 ml deionized water
AlCl3.6H2O           2.0
CoCl2.6H2O           0.8
KCr(SO4)2.12H2O      0.2
CuCl2.2H2O           0.2
H3BO3                0.1
KI                   2.0
MnSO4.H2O            2.0
NiSO4.6H2O           0.09
Na2MoO4.2H2O         0.4
ZnSO4.7H2O           0.4

The fermentation was carried out at a temperature of 37°C and at pH 6.7, the pH being
controlled by automatic addition of 6M sodium hydroxide and 2M sulphuric acid solutions.
The dissolved oxygen tension (dOT) set-point was 50% air saturation and was controlled
by automatic adjustment of the fermenter stirrer speed. Air flow to the fermenter was
20 L/min, corresponding to 1.33 volume volume per minute (VVM). A solution of yeast
extract (stock 900 g/4 L) was fed into the fermenter at a rate of 190 ml/h from 4.5 h post
inoculation. When the culture OD550 reached ca. 40-45 (6 h post inoculation), the
fermentation temperature was decreased to 25°C and the fermentation continued at this
temperature under the conditions described above for a further 1 h.

When the culture OD550 reached ca. 50-55 (7 h post inoculation), procathepsin
L-6His expression/secretion was induced by adding 75 g L-arabinose to the fermenter (150 ml
of 50% stock). The fermentation was continued under these conditions until the supply of
carbon source (glycerol) in the fermentation became exhausted (9 h post inoculation),
leading to a rapid rise in the dOT from 50% air saturation. At this point, a feed containing
glycerol (714 g/L) and ammonium sulphate (143 g/L) was pumped into the fermenter.
The rate at which this feed was supplied was adjusted to restrict the bacterial oxygen
uptake rate (OUR) to approximately 80% of the fermenter's maximum oxygen transfer rate
(OTR), under the conditions described, whilst first returning and then maintaining the dOT
at 50% air saturation. The fermentation was continued under these conditions until about
56 h post fermenter inoculation, when the culture was harvested by transferring aliquots of
the fermenter contents into 1 L centrifuge bottles.
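
The figures quoted for the fermentation can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the 50% arabinose stock is assumed to be weight/volume, and the maximum OTR passed to target_our is a placeholder, since no value is given in the text.

```python
def vvm(air_flow_l_per_min, culture_volume_l):
    """Air flow expressed as volumes of air per volume of culture per minute."""
    return air_flow_l_per_min / culture_volume_l


def stock_volume_ml(mass_g, stock_percent_w_v):
    """Volume (ml) of a % (w/v) stock solution that contains mass_g of solute."""
    return mass_g / (stock_percent_w_v / 100.0)


def feed_rate_g_per_h(stock_g, stock_volume_l, feed_ml_per_h):
    """Mass of solute delivered per hour by a feed of known concentration."""
    return (stock_g / stock_volume_l) * (feed_ml_per_h / 1000.0)


def target_our(max_otr, fraction=0.8):
    """Oxygen uptake rate to aim for, as a fraction of the maximum OTR."""
    return fraction * max_otr


if __name__ == "__main__":
    print(round(vvm(20.0, 15.0), 2))              # air flow for a 15 L culture
    print(stock_volume_ml(75.0, 50.0))            # arabinose stock volume
    print(feed_rate_g_per_h(900.0, 4.0, 190.0))   # yeast extract fed per hour
    print(target_our(100.0))                      # placeholder OTR, arbitrary units
```

Under these assumptions the air flow of 20 L/min into 15 L corresponds to about 1.33 VVM and 75 g of arabinose corresponds to 150 ml of stock, in agreement with the text.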

The spent medium was separated from the bacterial cells by centrifugation in a
Sorvall™ RC-3B centrifuge (7000 x g, 4°C, 30 min). Accumulation of procathepsin L-6His
in the spent growth medium and in the periplasm (extracts prepared using osmotic shock) was
determined using SDS-PAGE followed by western blotting using an anti-Cathepsin L
antibody as the primary antibody.
1.4 Purification of cathepsin L-6His 

E. coli broth (MSD 2148; NCIMB no 40773) was harvested by centrifugation at
5000 rpm for 30 min in a H 6000A rotor (Sorvall Instruments). All procedures were
carried out at 4°C except where stated. The pellet was discarded while the supernatant (10
litres) was concentrated to 1 litre and dialysed with 15 litres of 20 mM Tris.HCl pH 7.0 in
a spiral cartridge (Amicon™). The retentate was centrifuged at 12,000 rpm for 30 min in a
GSA™ rotor (Sorvall Instruments). The pellet was discarded and the supernatant was
further centrifuged at 35,000 rpm for 30 min in a 45 Ti rotor (Beckman Instruments).

The pellet was discarded and the supernatant was applied to a 1.5 cm x 3 cm Ni2+
nitrilo-tri-acetic acid-agarose column (QiaGen™) (flow rate 1 ml/min) equilibrated with 20
mM Tris.HCl pH 7.0. The unbound protein was collected and applied to a second 1.5 cm x
11 cm NTA column equilibrated with the above buffer. Both columns were washed
with 20 mM Tris.HCl pH 7.0, followed by 300 mM NaCl, 20 mM Tris.HCl pH 7.0, then
300 mM NaCl, sodium acetate pH 6.0 and finally eluted with 500 mM imidazole, sodium
acetate pH 6.0. The two eluates (final pH 6.3) were pooled (protein concentration 1.3
mg/ml) and dialysed against 3 x 5 litres 20 mM sodium acetate pH 5.8 over 72 h. As a
precipitate formed on dialysis, the combined eluates were centrifuged at 35,000 rpm for 30
min in a 45 Ti rotor™ (Beckman). The pellet (P) fraction was collected (note this material
resulted in the isolation of cathepsin L-6His which surprisingly could be crystallised whilst
cathepsin from the soluble fraction could not).

The P fraction was resuspended in 20 mM Tris.HCl pH 7.0, incubated for 48 h,
spun at 35,000 rpm for 30 min in a 45 Ti rotor™ and then treated at pH 4 for 60 min at
22°C. The solution was subsequently dialysed against 20 mM sodium acetate pH 5.5.
The dialysate was applied to an ion exchange column (CM Fractogel™ (E. Merck/BDH),
1.5 cm x 11 cm, flow rate 1.0 ml/min) equilibrated with 20 mM sodium acetate pH 5.5,
washed with 20 mM sodium acetate pH 5.5 and eluted with a gradient of 0-500 mM
sodium chloride in 20 mM sodium acetate pH 5.5. The fractions were assayed using a
Z-Phe-Arg-NHMec assay and the active fractions were pooled and subsequently treated at pH
4 for 60 min at 22°C. The pool was frozen at -20°C.

Samples were thawed to 4°C and dialysed against sodium acetate pH 4.4 (3 x 5
litres) overnight and finally stored at -20°C.
1.5 Z-Phe-Arg-NHMec assay for cathepsin L-6His 

Cathepsin L-6His protein was incubated at 37°C for 30 min with the substrate
20 µM Z-Phe-Arg-NHMec in an incubation buffer (total incubation volume = 175 µl). The
reaction was stopped with 175 µl stop buffer. The sample was excited at 370 nm, and the
emitted fluorescence was measured at 460 nm. Incubation buffer comprises: 88 mM
KH2PO4, 12 mM Na2HPO4, 1 mM EDTA (disodium salt), 0.1% Brij 35 and 1 mM
cysteine. Stop buffer comprises: 5.75 ml glacial acetic acid and 11.6 g sodium
chloroacetate per litre water, adjusted to pH 4.3 with NaOH. Results are expressed as
nmoles of product formed per min normalised per mg of protein present (as measured by
the Bradford method).
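
As an illustration of how the assay result is expressed, the following Python sketch converts an amount of product formed into a specific activity in nmoles/min/mg. It is a sketch only: the function names and the example figures are hypothetical, and the conversion of fluorescence readings into nmoles of product (for example by a standard curve) is assumed to have been done beforehand, since it is not described in the text.

```python
def specific_activity(product_nmol, incubation_min, protein_mg):
    """Specific activity in nmol of product per minute per mg of protein."""
    if incubation_min <= 0 or protein_mg <= 0:
        raise ValueError("incubation time and protein mass must be positive")
    return product_nmol / incubation_min / protein_mg


def meets_threshold(activity, threshold=40000.0):
    """True if the activity reaches a stated minimum (default 40,000 nmol/min/mg)."""
    return activity >= threshold


if __name__ == "__main__":
    # Hypothetical example: 3 nmol of product released in the 30 min incubation
    # by 1 ng (1e-6 mg) of enzyme.
    activity = specific_activity(3.0, 30.0, 1e-6)
    print(round(activity), meets_threshold(activity))
```

The default threshold corresponds to the minimum specific activity recited above; the higher preferred values can be passed explicitly.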

Example 2 - Crystallisation and Structure Determination

This work was performed on mature "clipped" cathepsin L, i.e. in two-chain form,
175 amino acids and 42 amino acids, linked by a disulfide bridge, with 3 residues being
chopped off when the mature form was clipped.

Samples of purified cathepsin L (from Example 1.4) were thawed and concentrated
on a Centricon C-10™ centrifuge filtration device (Amicon) using a GA-10 rotor™
operating at 5000 rpm at 10°C until the final protein concentration as determined by the
Bradford method was 10 mg/ml. No crystals resulted, but initial indications for probable
crystallisation conditions were found using the commercially available Hampton™
screening conditions (Jancarik, J. & Kim, S.H. J. Appl. Cryst. 1991, 24, 409-411) on a
Douglas Instruments Impax V™ crystallisation robot, which prepares small drops under oil
using the batch crystallisation method (McPherson, A. 1982. Preparation and analysis of
protein crystals. John Wiley & Sons, New York). Crystallisation conditions were then
found and optimised manually using trays of hanging drop vapour diffusion experiments
(McPherson, A. 1982. Preparation and analysis of protein crystals. John Wiley & Sons,
New York). Best results were obtained when 4 µL of protein solution were dispensed onto
siliclad coverslips and carefully mixed with an equal volume of reservoir solution (100 mM
cacodylate buffer at pH 6.2 containing 30% 2-methylpentan-2,4-diol). The coverslips were
inverted and the drops were suspended over wells containing 950 µL of reservoir solution.
The hanging drop experiments were equilibrated at 4°C. Wedge-shaped crystals appeared
after 2-4 days and grew to their full size, typically 0.1 mm³, within 7 days.
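
For orientation, the starting composition of such a hanging drop follows from simple dilution. The Python sketch below is illustrative only; the function name is hypothetical and it assumes that the protein solution itself contains no MPD or cacodylate before mixing.

```python
def mix_drop(protein_mg_per_ml, mpd_percent, buffer_mm,
             protein_ul=4.0, reservoir_ul=4.0):
    """Initial composition of a drop made by mixing protein and reservoir solutions.

    Assumes the precipitant (MPD) and buffer come only from the reservoir
    solution and the protein only from the protein solution.
    """
    total_ul = protein_ul + reservoir_ul
    return {
        "volume_ul": total_ul,
        "protein_mg_per_ml": protein_mg_per_ml * protein_ul / total_ul,
        "mpd_percent": mpd_percent * reservoir_ul / total_ul,
        "buffer_mm": buffer_mm * reservoir_ul / total_ul,
    }


if __name__ == "__main__":
    # 10 mg/ml protein mixed 1:1 with 30% MPD, 100 mM cacodylate reservoir.
    print(mix_drop(10.0, 30.0, 100.0))
```

With equal volumes the drop starts at half the reservoir concentration of MPD and half the stock protein concentration, and then concentrates towards the reservoir conditions as vapour diffusion proceeds.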
2.1 Data collection and processing 

The crystallisation conditions fortuitously also functioned as cryoprotectant since
they contained 30% methylpentandiol (MPD). This means the crystals could be flash
frozen in a cold nitrogen stream at 100 K using a cryostream cooler (Oxford
Cryosystems™), thereby protecting the crystal against radiation damage, which was
considered important given the small size of the crystals. A frozen crystal diffracted
X-rays to 2 Å resolution. Intensity data were recorded on a 30 cm Imaging Plate
(MarResearch) positioned 150 mm away from the crystal. X-rays were generated by a
rotating anode generator (Enraf-Nonius FR571™) operating at 40 kV and 90 mA,
producing graphite monochromatised CuKα X-radiation. Images of 1° of crystal rotation
were recorded with 30 minutes exposure. After a number of images were collected, the
autoindexing routines in XDS software™ (Kabsch, W. 1988. J. Appl. Cryst. 21, 916-924)
adapted for use with the MAR image plate were used to determine the symmetry and unit
cell dimensions of the crystal. These were consistent with the monoclinic space group P2₁,
with a = 46.23, b = 49.38, c = 49.25 Å and β = 113.45°. The presence of a single molecule
of cathepsin L in the asymmetric unit corresponds to a volume to mass ratio of Vm ≈ 2.1
Å³/Da, or a solvent content of about 40% in the crystal. The data collection was continued
for 5 days on a single crystal, which would probably not have been possible without
freezing the crystal. The 27496 intensity data were merged across equivalent reflections to
give 12421 unique reflection intensities using the program MARSCALE. The overall
Rmerge was 6.0% for all data between 15 and 2 Å, while in the highest resolution shell
(2.4-2.0 Å) Rmerge was * 1.9%. The intensity data were reduced to structure amplitudes
using the program TRUNCATE in the CCP4 suite (CCP4, 1994, Acta Cryst D50, 760-763).
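
The volume to mass ratio and solvent content quoted above can be checked from the unit cell dimensions. The Python sketch below is illustrative only and rests on stated assumptions: two asymmetric units per P2₁ cell, a molecular mass estimated as 223 residues (175 + 42 + 6 His) at an average of 110 Da per residue, and the usual approximation that the solvent fraction is 1 - 1.23/Vm. None of these figures is given in the text.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Unit cell volume (cubic angstroms) of a monoclinic cell."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume, molecules_per_cell, mol_mass_da):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return cell_volume / (molecules_per_cell * mol_mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from the volume to mass ratio."""
    return 1.0 - 1.23 / vm


# Cell dimensions from the text; mass and molecule count are assumptions.
VOLUME = monoclinic_cell_volume(46.23, 49.38, 49.25, 113.45)
ASSUMED_MASS_DA = 223 * 110.0      # assumed residue count x assumed mean residue mass
MOLECULES_PER_CELL = 2             # P2(1) with one molecule per asymmetric unit
VM = matthews_coefficient(VOLUME, MOLECULES_PER_CELL, ASSUMED_MASS_DA)
SOLVENT = solvent_fraction(VM)
REDUNDANCY = 27496 / 12421         # measurements per unique reflection

if __name__ == "__main__":
    print(round(VOLUME), round(VM, 2), round(SOLVENT, 2), round(REDUNDANCY, 2))
```

Under these assumptions the calculated values are close to the Vm of about 2.1 Å³/Da and the solvent content of about 40% stated in the text, and each unique reflection was measured a little over twice on average.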
2.2 Structure solution and refinement

Since there is 40% amino acid identity with papain, a well-refined papain model
was thought to be a suitable trial model for molecular replacement. Three variations of the
trial model were created: firstly a full papain model (Brookhaven Data Bank entry 9PAP),
secondly the papain model with side chains truncated to the least common denominator
and insertions with respect to cathepsin L deleted, and thirdly a homology model of
cathepsin L built from the papain structure and energy minimised.

The program AMORE as available from the CCP4 suite of protein crystallography
programs (CCP4, 1994, Acta Cryst D50, 760-763) was used for the molecular replacement.
Structure factors for the trial models were calculated in a cubic cell of 80 Å. The molecular
replacement was carried out using all data between the resolution limits 10 and 4 Å within a
Patterson integration radius of 25 Å. The three trial models gave rotation function values of
24, 23 and 21, respectively, which were over twice the value of the next highest peak in the
rotation function maps. Following the translation function, the correlations between
observed and calculated data were, for the three trial models, 0.395, 0.440 and 0.355
respectively, while the crystallographic R-values were 46.6, 46.0 and 48.8%, for the full
papain, trimmed papain, and homology model respectively.

Inspection with computer graphics showed that papain molecules rotated and
translated to these molecular replacement solutions packed well into the unit cell without
any interpenetration of neighbouring molecules. The full papain and trimmed papain
models gave equivalent solutions, while the homology model had positioned the molecule
correctly but upside down relative to the papain models. Attempts to refine the homology
model solution were not successful. The trimmed papain molecule, rotated and translated
from the Brookhaven data bank coordinates (entry 9PAP) by the eulerian angles α =
235.63, β = 168.58, and γ = 95.75° and by the fractional coordinates 0.160166, 0.0,
-0.27854, was taken for further work.
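
To make the meaning of such a solution concrete, the Python sketch below builds a rotation matrix from three Eulerian angles and converts a fractional translation into ångströms for a monoclinic cell. It is a sketch under stated assumptions, not a description of the program's internals: the angles are assumed to follow a Z-Y-Z convention (rotation about z, then y, then z), the cell is assumed to be orthogonalised with a along x and b along y, and all function names are hypothetical.

```python
import math


def euler_zyz_matrix(alpha_deg, beta_deg, gamma_deg):
    """Rotation matrix R = Rz(alpha) * Ry(beta) * Rz(gamma) (assumed convention)."""
    a, b, g = (math.radians(x) for x in (alpha_deg, beta_deg, gamma_deg))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cg, sg = math.cos(g), math.sin(g)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]


def monoclinic_frac_to_orth(frac, a, b, c, cell_beta_deg):
    """Convert fractional coordinates to angstroms (a along x, b along y)."""
    u, v, w = frac
    cos_b = math.cos(math.radians(cell_beta_deg))
    sin_b = math.sin(math.radians(cell_beta_deg))
    return (a * u + c * cos_b * w, b * v, c * sin_b * w)


def transform(point, rotation, shift):
    """Apply x' = R x + t to a single coordinate triple."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + shift[i]
        for i in range(3)
    )


# Angles, translation and cell dimensions as quoted in the text.
ROTATION = euler_zyz_matrix(235.63, 168.58, 95.75)
SHIFT = monoclinic_frac_to_orth((0.160166, 0.0, -0.27854),
                                46.23, 49.38, 49.25, 113.45)

if __name__ == "__main__":
    # Move a hypothetical atom at (1, 2, 3) angstroms in the model frame.
    print(transform((1.0, 2.0, 3.0), ROTATION, SHIFT))
```

If a different angle or axis convention applies, only euler_zyz_matrix and monoclinic_frac_to_orth need to change; the transformation x' = Rx + t is the same.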

This model was refined using the program XPLOR™ and the standard simulated
annealing protocol suggested by Bruenger (Bruenger, 1988, J. Mol. Biol. 203, 803-816).
The refinement used all data between 10 and 2 Å. After the first round of refinement, R was
35%, but the Rfree was still around 43%, suggesting considerable model bias. Inspection of a
2Fo-Fc electron density map showed that while part of the model fit the density well, there
were also large portions of the molecule where the agreement with the density was poor.
However, the solution was undoubtedly correct since exceptionally clear sidechain density
for several aromatic amino acid residues was visible, whereas the model had been
truncated to Cβ in these cases.
As much of the electron density was interpreted as possible, and those portions
where the interpretation was not clear were omitted from the model altogether before
refinement continued. The model consisted of 6 segments at the next round of refinement.
A second round of simulated annealing refinement using XPLOR™ resulted in R = 37%
and Rfree = 40%; however, the electron density map now shows clearly the positions of
many side chains and of some of the loops that were previously omitted, although there
still remained some 30 residues which have not been built correctly yet. The model was
again subjected to simulated annealing, resulting in R = 34% to 2 Å. The resultant electron
density map now shows all missing loops. Individual isotropic atomic temperature factors
have been applied and solvent water molecules have been added to the model. This results
in an accurate structure (2 Å) of cathepsin L. The coordinates of the model are set out in
Example 4. This model has R = 21% with 157 solvent water molecules included.
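
The R-values quoted during refinement measure the disagreement between observed and calculated structure amplitudes. The Python sketch below shows the conventional definition, R = Σ| |Fo| - |Fc| | / Σ|Fo|; it is illustrative only, uses made-up amplitudes, and assumes that the calculated amplitudes have already been placed on the same scale as the observed ones. Rfree is the same quantity evaluated on reflections held back from refinement.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-value for paired observed and calculated amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def split_work_free(reflections, every_nth=10):
    """Split reflections into a working set and a test set for a free R-value.

    Every nth reflection goes to the test set; this simple rule is an
    assumption for illustration, not the selection used in the text.
    """
    work = [r for i, r in enumerate(reflections) if i % every_nth != 0]
    free = [r for i, r in enumerate(reflections) if i % every_nth == 0]
    return work, free


if __name__ == "__main__":
    # Made-up amplitudes, for demonstration only.
    observed = [100.0, 50.0, 25.0, 25.0]
    calculated = [90.0, 55.0, 30.0, 20.0]
    print(r_factor(observed, calculated))
```

A fall in R that is not matched by a fall in the free value, as after the first round of refinement above, is the usual sign of model bias.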

Example 3: Use of the cathepsin L structure in structure-based drug design 

Having determined the coordinates for the atomic structure of cathepsin L, as well
as having devised the crystallisation protocol for cathepsin L, we are now in a strong
position to design drugs against the disease targets that involve cathepsin L, such as bone
resorption and osteoporosis. The approaches used for this include the following.
(1) We superpose our cathepsin L coordinates onto other available coordinates of thiol
proteases which have inhibitors bound, to give us a first approximation of the way these and
related inhibitors might bind cathepsin. The active site of cathepsin is very similar to that
of papain, for example, as might be expected from sequence homology. However,
inhibitors of papain such as leupeptin extend to a region of the protease where the
homology begins to break down, and these different areas in the protein confer different
binding modes for the inhibitors which can be modelled. We can see which parts of the
inhibitors can be modified without affecting the interactions the inhibitor makes with the
protein; it is these modifications that can be used to adjust the physical properties of the
inhibitor without affecting the binding. We can also predict modifications of the inhibitor
which lead to stronger interactions with the protein; this is rational optimisation of a lead
compound. Such predictions are made in conference with medicinal chemists who assess
the feasibility of synthesis; this allows us to generate compounds that can be tested in
activity assays and which may be complexed with the cathepsin crystals, allowing rapid
X-ray determination of the complex structure.

(2) Using our coordinates for the cathepsin L molecule, we use a variety of available 
ligand design software to generate pharmacophore models that are specific for binding at 
the cathepsin L active site. These in turn can be used to search real and virtual compound 
databases to provide a list of possible candidate inhibitors that match the pharmacophore. 
Another way to generate models of ligands is to "grow" them in the active site. The 
software used includes commercially available programs such as LUDI™ and 
LEAPFROG™. 

LUDI™ examines interactions of molecular fragments from a database with the 
protein active site, and allows the user to build up a chemically realistic molecule from 
suitable fragments. 

LEAPFROG™ "grows" new ligands using a genetic algorithm, the parameters that 
direct the growth of the ligand being under the control of the user. 

Compounds proposed by the ligand design software are synthesised and tested for 
activity. If active, they can be complexed with crystals of cathepsin L for structural 
analysis. 
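
The database search in (2) can be pictured as a distance-matching problem. The sketch below is a deliberately simplified illustration and does not reproduce LUDI™, LEAPFROG™ or any other program: a pharmacophore is assumed to be a set of required distances between feature types, and a compound matches if some choice of its features satisfies every distance within a tolerance. All feature names, distances and compound coordinates are hypothetical and are not derived from the cathepsin L coordinates.

```python
import itertools
import math

def matches(pharmacophore, features, tolerance=0.5):
    """Return True if one conformer satisfies every distance constraint.

    pharmacophore: dict mapping (type_a, type_b) -> required distance in angstroms
    features: list of (type, (x, y, z)) for a single conformer
    Pairs of the same feature type are not handled in this sketch.
    """
    types = sorted({t for pair in pharmacophore for t in pair})
    candidates = [[xyz for t, xyz in features if t == wanted] for wanted in types]
    # Try every way of assigning one feature of each required type.
    for combo in itertools.product(*candidates):
        placed = dict(zip(types, combo))
        if all(abs(math.dist(placed[a], placed[b]) - d) <= tolerance
               for (a, b), d in pharmacophore.items()):
            return True
    return False

def search(pharmacophore, database, tolerance=0.5):
    """Names of all database entries that match the pharmacophore."""
    return [name for name, feats in database.items()
            if matches(pharmacophore, feats, tolerance)]

# Hypothetical three-point pharmacophore and a two-compound "database".
pharmacophore = {
    ("acceptor", "donor"): 5.0,
    ("donor", "hydrophobe"): 7.0,
    ("acceptor", "hydrophobe"): 6.0,
}
database = {
    "compound_A": [("donor", (0.0, 0.0, 0.0)),
                   ("acceptor", (5.0, 0.0, 0.0)),
                   ("hydrophobe", (3.8, 5.9, 0.0))],
    "compound_B": [("donor", (0.0, 0.0, 0.0)),
                   ("acceptor", (5.0, 0.0, 0.0)),
                   ("hydrophobe", (3.8, 1.0, 0.0))],
}
hits = search(pharmacophore, database)
print(hits)
```

In practice each compound would be represented by many conformers and the feature definitions would come from the ligand design software; the sketch only shows the matching logic.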

(3) An important step in the drug design cycle is the feedback of how a designed 
compound truly interacts with the protein, which can be obtained if a crystallisation protocol is 
in place. The crystals can be soaked in a solution of the inhibitor for 24 hours before 
collecting the X-ray diffraction patterns. A (complex-native) difference Fourier synthesis 
using the phases derived from the native protein crystal structure is a straightforward 
calculation which rapidly gives the electron density at atomic resolution for the inhibitor, 
showing exactly where and how it binds. Alternatively, the crystals can be grown in the 
presence of the inhibitor. The crystal structure of the inhibitor complex is useful in many 
ways. For example, it allows us to examine unambiguously the interactions that the 
inhibitor makes with the protein so that we can enter a new cycle of rational improvement, 
and it also enables us to identify parts of the molecule which may be modified in order to 
alter physical properties without compromising binding, so as to improve in vivo activity. 
Importantly, it allows us to learn how well our intended optimisations resemble what 
actually occurs in the crystal, so that we may further refine our own ability to design 
optimised inhibitor-protein interactions. 

(4) Inhibitors found by random screening can be optimised rationally as in (3). 

(5) The inhibitors found from two different chemical series by any of the above 
methods can be combined rationally to produce hybrid compounds that incorporate the 
protein interactions seen in both series of inhibitors to maximum effect. 

(6) Inhibitors found using the above methods are analysed structurally to arrive at ways 
of generating diversity through the use of carefully designed chemical libraries. In this 
way, larger numbers of substituents can be tested for activity and additional compounds of 
interest can be optimised further by re-entering them in the rational design cycle. 

Inhibitors identified using the above methods are analysed further in assays 
including in vivo assays. 

Example 4 

Example 5 

Assay of Inhibition of Cathepsin L Activity 



Potential drugs may be tested for inhibition of cathepsin L activity as follows. 
Briefly, cathepsin L (19 pM final, as determined by active site titration using E-64; Barrett 
et al., 1981, Methods in Enzymology, 80: 535) is added to 750 µL of assay buffer 
consisting of 340 mM sodium acetate - 60 mM acetic acid (pH 5.5), 4 mM disodium 
EDTA and 8 mM DTT. Peptides are added at concentrations ranging from on to 35 µM. 
After a 6 minute incubation at 25°C, 1500 µL of 0.1% Brij 35 is added and the reaction 
is started with 5.0 µM Z-Phe-Arg-NHMec (AMC; AMC is aminomethylcoumarin). The 
fluorescence of the free aminomethylcoumarin is measured in a fluorimeter at an excitation of 
370 nm and an emission of 432 nm over a 6 minute time period. Fluorescence is 
converted to µM AMC released by using a standard curve generated by plotting µM AMC 
vs. fluorescence, and plotted vs. time. In order to calculate the velocities, a linear least 
squares analysis is performed over the initial part of the data. The IC50 values are 
determined from Dixon plots of the form V-/V+ (velocity without compound/velocity with 
compound) vs. compound concentration. 
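
The data reduction described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis actually performed: the standard curve is taken to be linear, the "initial part" of the data is taken to be the first n_initial time points, and the IC50 is read from a straight-line Dixon plot as the concentration at which V-/V+ = 2 (which holds for simple inhibition, where V-/V+ = 1 + [I]/IC50). The function names and all numbers in the usage lines are hypothetical and are not data from this assay.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def initial_velocity(times_min, fluorescence, std_fluor, std_amc_um, n_initial):
    """Velocity in uM AMC per minute from the first n_initial points of a trace."""
    # Standard curve: uM AMC as a linear function of fluorescence.
    slope, intercept = linear_fit(std_fluor, std_amc_um)
    amc_um = [slope * f + intercept for f in fluorescence]
    velocity, _ = linear_fit(times_min[:n_initial], amc_um[:n_initial])
    return velocity

def ic50_from_dixon(conc_um, v_without, v_with):
    """Fit V-/V+ against concentration and return the concentration where it equals 2."""
    ratios = [v_without / v for v in v_with]
    slope, intercept = linear_fit(conc_um, ratios)
    return (2.0 - intercept) / slope

# Hypothetical standard curve and progress curve (the trace levels off late on).
std_fluor = [50.0, 150.0, 250.0]
std_amc_um = [0.0, 1.0, 2.0]
times_min = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
trace = [50.0, 70.0, 90.0, 110.0, 128.0, 144.0, 158.0]
v0 = initial_velocity(times_min, trace, std_fluor, std_amc_um, n_initial=4)

# Hypothetical velocities measured with increasing compound concentration.
conc_um = [0.0, 5.0, 10.0, 20.0]
v_with = [0.2, 0.2 / 1.5, 0.1, 0.2 / 3.0]
ic50 = ic50_from_dixon(conc_um, v_without=0.2, v_with=v_with)
print("initial velocity (uM/min):", v0)
print("IC50 (uM):", ic50)
```

Restricting the fit to the early, linear part of the progress curve avoids underestimating the velocity once substrate depletion causes the trace to curve.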

Example 6 

Assay of inhibition of proteoglycan degradation 

Cathepsins B and L have been found to degrade cartilage components, causing the 
degradation of associated proteoglycans (Maciewicz and Wotton, Biomedica Biochimica 
Acta, 50, 561-4, 1991). Previous results by the same researchers indicated that these 
enzymes are found in active form in the synovial fluid of arthritic patients. The conclusion 
drawn was that cysteine proteinases play a role in the etiology of arthritis. 

Cartilage degradation can be induced by the in vivo injection of IL-1 into the joint 
capsule of rabbits, and the administration of a large serine protease inhibitor, PN-1 (43 kD), 
can ameliorate this degradation (Stevens et al., Agents and Actions, Supplements, 22, 173-7, 
1993). It has been found that a low molecular weight synthetic peptide metalloproteinase 
inhibitor can prevent the breakdown of proteoglycan within articular cartilage in vitro 
(Andrews et al., Agents and Actions, 22, 147-54, 1992). Certain cysteine endopeptidase 
inactivators were found to inhibit IL-1 stimulated structural cartilage proteoglycan 
degradation. E64 and Ep475 (broad-spectrum cysteine protease inhibitors) did not work at 
100 µM concentrations. However, lipophilic derivatives inhibited at 10 µM to 1 µM 
concentrations (Buttle et al., Biochemical Journal, 281, 175-7, 1992). Compounds can be 
tested for the ability to inhibit articular cartilage proteoglycan degradation, as measured by 
proteoglycan release after IL-1 stimulation. 

The assay system used for testing the peptides for inhibition of 
proteoglycan release is a micro organ culture assay (MOCA). Papain, cetyl pyridinium 
chloride and chondroitin sulfate type C are purchased from Sigma Chemical Co. 
(St. Louis, MO). Interleukin 1 alpha is purchased from Collaborative Research. ABCase, 
chondroitinase, ABC lyase, and keratanase are obtained from ICI. Na³⁵SO₄ is purchased 
from NEN. 

Articular cartilage explants from calf knee joints are maintained in culture in 
DMEM medium containing 20 µCi/ml Na³⁵SO₄ for 48 hours for the incorporation of label 
into newly synthesized proteoglycan (PG). The radiolabelled medium is then removed, the 
radiolabelled explants are washed with 3 x 30 ml cold DMEM and placed into a 96 well plate with 
or without IL-1 (Interleukin-1 alpha, 59 u/ml) and various concentrations of test 
compound. The explants are incubated first for 24 hours in the presence of IL-1 in order to 
ensure initiation of IL-1 induced auto-catalysis prior to the addition of various 
metalloprotease inhibitors for an additional 72 hours. 

The newly synthesized radiolabelled proteoglycans released during the cultivation 
period are subjected to enzymatic digestion with papain. Briefly, an aliquot of 150 µl of 
medium from the original culture volume (300 µl) is incubated with 100 µl of papain 
(3 ml/ml) for 2 hours at 65°C. A 50 µl aliquot of the papain digested material containing 
radiolabelled ³⁵SO₄-gag (glycosaminoglycans, mucopolysaccharide) is incubated with 
100 µl of cetylpyridinium chloride (cpc, 4% cpc + 40 nM NaSO₄) plus cold chondroitin 
sulfate standard (30 µl of 2.5 mg/ml solution). The samples are placed on ice for 60 
minutes, and the radiolabelled gags are precipitated and collected on a 96 well plate 
harvester (MACH2, TOMTEC, Orange, CT) glass fiber filter. The filter is dried and 
counted after addition of 10 ml of scintillation cocktail (scintillant). 
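
The volumes given above fix what fraction of the released label reaches the filter, which is needed to scale the counts back to the whole culture. The sketch below works through that arithmetic. It assumes that the whole cpc precipitate from the 50 µl aliquot is collected on the filter. The percent-inhibition formula is not stated in this example and is shown only as one common way of expressing such results. The function names and the counts in the usage lines are hypothetical.

```python
def counted_fraction(culture_ul=300.0, medium_aliquot_ul=150.0,
                     papain_ul=100.0, digest_aliquot_ul=50.0):
    """Fraction of the label released into the culture medium that is counted."""
    fraction_digested = medium_aliquot_ul / culture_ul          # 150 of 300 ul
    digest_volume_ul = medium_aliquot_ul + papain_ul            # 250 ul digest
    fraction_precipitated = digest_aliquot_ul / digest_volume_ul  # 50 of 250 ul
    return fraction_digested * fraction_precipitated

def total_released_cpm(filter_cpm, fraction):
    """Scale the counts on the filter back to the whole culture volume."""
    return filter_cpm / fraction

def percent_inhibition(cpm_il1, cpm_il1_plus_compound, cpm_unstimulated):
    """Assumed definition: reduction of the IL-1 induced release, in percent."""
    induced = cpm_il1 - cpm_unstimulated
    return 100.0 * (cpm_il1 - cpm_il1_plus_compound) / induced

fraction = counted_fraction()
print("fraction of released label counted:", fraction)

# Hypothetical filter counts for three wells.
inhibition = percent_inhibition(cpm_il1=5000.0,
                                cpm_il1_plus_compound=2600.0,
                                cpm_unstimulated=1000.0)
print("percent inhibition:", inhibition)
```

With the stated volumes one tenth of the released label is counted, so the filter counts are multiplied by ten to estimate the total release per well.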
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: Zeneca Limited 

(B) STREET: 15 Stanhope Gate 

(C) CITY: London 

(E) COUNTRY: United Kingdom 

(F) POSTAL CODE (ZIP) : W1Y 6LN 

(G) TELEPHONE: (0171) 304 5000 

(H) TELEFAX: (0171) 304 5151 



(ii) TITLE OF INVENTION: Protein 

(iii) NUMBER OF SEQUENCES: 8 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO) 

(vi) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: GB 9522660.1 

(B) FILING DATE: 04-NOV-1995 



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

GCAGTAAGAT ATGAATCCTA CACT 24 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

CATCACCGTC CACAGCTCAC ACAG 24 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

GATGACCATG GCGACTCTAA CATTTGATCA CAG 33 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

CTGCCTGAAT TCTTCACTGG TCATGTCTC 29 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CTATCCATAT GAGGCAACAG AAGAATCCTG 30 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GACAAGCTTG AATTCTTATT AGTGATGGTG ATGGTGGTGC ACAGTGGGGT AGCTGGCTG 59 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1575 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

AGAACCGCGA CCTCCGCAAC CTTGAGCGGC ATCCGTGGAG TGCGCCTGCA GCTACGACCG 60 
CAGCAGGAAA GCGCCGCCGG CCAGGCCCAG CTGTGGCCGG ACAGGGACTG GAAGAGAGGA 120 
CGCGGTCGAG TAGGTGTGCA CCAGCCCTGG CAACGAGAGC GTCTACCCCG AACTCTGCTG 180 
GCCTTGAGGT GGGGAAGCCG GGGAGGGCAG TTGAGGACCC CGCGGAGGCG CGTGACTGGT 240 
TGAGCGGGCA GGCCAGCCTC CGAGCCGGGT GGACACAGGT TTTAAAACAT GAATCCTACA 300 
CTCATCCTTG CTGCCTTTTG CCTGGGAATT GCCTCAGCTA CTCTAACATT TGATCACAGT 360 
TTAGAGGCAC AGTGGACCAA GTGGAAGGCG ATGCACAACA GATTATACGG CATGAATGAA 420 
GAAGGATGGA GGAGAGCAGT GTGGGAGAAG AACATGAAGA TGATTGAACT GCACAATCAG 480 
GAATACAGGG AAGGGAAACA CAGCTTCACA ATGGCCATGA ACGCCTTTGG AGACATGACC 540 
AGTGAAGAAT TCAGGCAGGT GATGAATGGC TTTCAAAACC GTAAGCCCAG GAAGGGGAAA 600 
GTGTTCCAGG AACCTCTGTT TTATGAGGCC CCCAGATCTG TGGATTGGAG AGAGAAAGGC 660 
TACGTGACTC CTGTGAAGAA TCAGGGTCAG TGTGGTTCTT GTTGGGCTTT TAGTGCTACT 720 
GGTGCTCTTG AAGGACAGAT GTTCCGGAAA ACTGGGAGGC TTATCTCACT GAGTGAGCAG 780 
AATCTGGTAG ACTGCTCTGG GCCTCAAGGC AATGAAGGCT GCAATGGTGG CCTAATGGAT 840 
TATGCTTTCC AGTATGTTCA GGATAATGGA GGCCTGGACT CTGAGGAATC CTATCCATAT 900 
GAGGCAACAG AAGAATCCTG TAAGTACAAT CCCAAGTATT CTGTTGCTAA TGACACCGGC 960 
TTTGTGGACA TCCCTAAGCA GGAGAAGGCC CTGATGAAGG CAGTTGCAAC TGTGGGGCCC 1020 
ATTTCTGTTG CTATTGATGC AGGTCATGAG TCCTTCCTGT TCTATAAAGA AGGCATTTAT 1080 
TTTGAGCCAG ACTGTAGCAG TGAAGACATG GATCATGGTG TGCTGGTGGT TGGCTACGGA 1140 
TTTGAAAGCA CAGAATCAGA TAACAATAAA TATTGGCTGG TGAAGAACAG CTGGGGTGAA 1200 
GAATGGGGCA TGGGTGGCTA CGTAAAGATG GCCAAAGACC GGAGAAACCA TTGTGGAATT 1260 
GCCTCAGCAG CCAGCTACCC CACTGTGTGA GCTGGTGGAC GGTGATGAGG AAGGACTTGA 1320 
CTGGGGATGG CGCATGCATG GGAGGAATTC ATCTTCAGTC TACCAGCCCC CGCTGTGTCG 1380 
GATACACACT CGAATCATTG AAGATCCGAG TGTGATTTGA ATTCTGTGAT ATTTTCACAC 1440 
TGGTAAATGT TACCTCTATT TTAATTACTG CTATAAATAG GTTTATATTA TTGATTCACT 1500 
TACTGACTTT GCATTTTCGT TTTTAAAAGG ATGTATAAAT TTTTACCTGT TTAAATAAAA 1560 
TTTAATTTCA AATGT 1575 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Asn Pro Thr Leu Ile Leu Ala Ala Phe Cys Leu Gly Ile Ala Ser
1               5                   10                  15

Ala Thr Leu Thr Phe Asp His Ser Leu Glu Ala Gln Trp Thr Lys Trp
            20                  25                  30

Lys Ala Met His Asn Arg Leu Tyr Gly Met Asn Glu Glu Gly Trp Arg
        35                  40                  45

Arg Ala Val Trp Glu Lys Asn Met Lys Met Ile Glu Leu His Asn Gln
    50                  55                  60

Glu Tyr Arg Glu Gly Lys His Ser Phe Thr Met Ala Met Asn Ala Phe
65                  70                  75                  80

Gly Asp Met Thr Ser Glu Glu Phe Arg Gln Val Met Asn Gly Phe Gln
                85                  90                  95

Asn Arg Lys Pro Arg Lys Gly Lys Val Phe Gln Glu Pro Leu Phe Tyr
            100                 105                 110

Glu Ala Pro Arg Ser Val Asp Trp Arg Glu Lys Gly Tyr Val Thr Pro
        115                 120                 125

Val Lys Asn Gln Gly Gln Cys Gly Ser Cys Trp Ala Phe Ser Ala Thr
    130                 135                 140

Gly Ala Leu Glu Gly Gln Met Phe Arg Lys Thr Gly Arg Leu Ile Ser
145                 150                 155                 160

Leu Ser Glu Gln Asn Leu Val Asp Cys Ser Gly Pro Gln Gly Asn Glu
                165                 170                 175

Gly Cys Asn Gly Gly Leu Met Asp Tyr Ala Phe Gln Tyr Val Gln Asp
            180                 185                 190

Asn Gly Gly Leu Asp Ser Glu Glu Ser Tyr Pro Tyr Glu Ala Thr Glu
        195                 200                 205

Glu Ser Cys Lys Tyr Asn Pro Lys Tyr Ser Val Ala Asn Asp Thr Gly
    210                 215                 220

Phe Val Asp Ile Pro Lys Gln Glu Lys Ala Leu Met Lys Ala Val Ala
225                 230                 235                 240

Thr Val Gly Pro Ile Ser Val Ala Ile Asp Ala Gly His Glu Ser Phe
                245                 250                 255

Leu Phe Tyr Lys Glu Gly Ile Tyr Phe Glu Pro Asp Cys Ser Ser Glu
            260                 265                 270

Asp Met Asp His Gly Val Leu Val Val Gly Tyr Gly Phe Glu Ser Thr
        275                 280                 285

Glu Ser Asp Asn Asn Lys Tyr Trp Leu Val Lys Asn Ser Trp Gly Glu
    290                 295                 300

Glu Trp Gly Met Gly Gly Tyr Val Lys Met Ala Lys Asp Arg Arg Asn
305                 310                 315                 320

His Cys Gly Ile Ala Ser Ala Ala Ser Tyr Pro Thr Val
                325                 330
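
The relationship between the oligonucleotides, the cDNA of SEQ ID NO: 7 and the protein of SEQ ID NO: 8 can be checked with a few lines of code. The sketch below is an illustrative consistency check only. It uses short excerpts copied from SEQ ID NO: 7 (around positions 521-570, 881-930 and 281-339) rather than the full sequence, and assumes the standard genetic code. The function and variable names are hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
ONE_TO_THREE = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
                "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
                "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
                "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val"}

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons from the first base until a stop codon."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":
            break
        residues.append(ONE_TO_THREE[amino])
    return residues

SEQ_ID_4 = "CTGCCTGAATTCTTCACTGGTCATGTCTC"
SEQ_ID_5 = "CTATCCATATGAGGCAACAGAAGAATCCTG"

# Excerpts of SEQ ID NO: 7 (spaces removed).
CDNA_521_570 = "ACGCCTTTGGAGACATGACCAGTGAAGAATTCAGGCAGGTGATGAATGGC"
CDNA_881_930 = "CTGAGGAATCCTATCCATATGAGGCAACAGAAGAATCCTGTAAGTACAAT"
CDNA_281_339 = "TTTAAAACATGAATCCTACACTCATCCTTGCTGCCTTTTGCCTGGGAATTGCCTCAGCT"

# SEQ ID NO: 5 reads in the sense direction; SEQ ID NO: 4 is antisense.
sense_hit = SEQ_ID_5 in CDNA_881_930
antisense_hit = reverse_complement(SEQ_ID_4) in CDNA_521_570

# Translate from the first ATG of the excerpt and compare with SEQ ID NO: 8.
start = CDNA_281_339.find("ATG")
n_terminus = translate(CDNA_281_339[start:])
SEQ_ID_8_START = ("Met Asn Pro Thr Leu Ile Leu Ala Ala Phe Cys Leu "
                  "Gly Ile Ala Ser Ala").split()
print(sense_hit, antisense_hit, n_terminus == SEQ_ID_8_START)
```

The same translation, applied to the whole open reading frame beginning at the first ATG of the excerpt, would be expected to reproduce all 333 residues of SEQ ID NO: 8.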
1. Cathepsin L at a specific activity of at least 40,000 nmoles/min/mg. 

2. Cathepsin L at a specific activity of at least 100,000 nmoles/min/mg. 

3. Cathepsin L in crystalline form. 

4. Cathepsin L according to claim 3 in which the crystals possess a monoclinic space 
group P2₁, with unit cell dimensions a = 46.23, b = 49.38, c = 49.25 Å and β = 113.45°. 

5. Cathepsin L according to claim 4 in which the crystals are wedge-shaped crystals 
having the atomic coordinates set out in Example 4. 

6. A secretion vector for directed inducible expression of cathepsin L into the 
periplasm of E.coli. 

7. An E.coli host comprising a secretion vector for directed inducible expression of 
cathepsin L into the host periplasm. 

8. A method of making cathepsin L which comprises culture of an E.coli host as 
defined in claim 7 in a growth medium and optionally collecting soluble cathepsin L from 
the growth medium. 

9. A method of making cathepsin L according to claim 8 in which the E.coli host is 
cultured at a temperature of 15-30°C. 

10. A method of making cathepsin L according to claim 8 in which the E.coli host is 
cultured at a temperature of about 25°C. 

11. A method of making cathepsin L according to claim 8 in which the E.coli host is 
cultured at a temperature of about 25°C in the presence of a concentration of inducer 
optimised for maximal soluble expression of cathepsin. 

12. E. coli USD 2148 (MSD 213 pZen 1677) deposited under NCIMB accession no. 
40773. 

13. The X-ray coordinates of cathepsin L as defined in Example 4. 

14. The X-ray coordinates of the active site of cathepsin L as defined in Example 4 in 
residues 17-26, 29, 61-75, 111-118, 132-145, 158-166, 183-189 & 209-214. 

15. A method of rational drug design using a 3-dimensional structure of cathepsin L as 
determined from X-ray diffraction data which comprises any of the following: 
searching real and virtual compound databases for potential drugs; computational growth 
of potential ligands in the active site; or collecting X-ray diffraction data from crystals 
comprising potential drug. 

16. A novel inhibitor of cathepsin L determined by rational drug design according to 
the method of claim 15. 